ImageSeer: Clustering and Searching WWW Images Using Link and Page Layout Analysis
Authors
Abstract
Due to the rapid growth in the number of digital images on the Web, there is an increasing demand for effective and efficient methods for organizing and retrieving them. This paper describes ImageSeer, a system for clustering and searching WWW images. Using a vision-based page segmentation algorithm, a web page is partitioned into blocks, so that the textual and link information of an image can be accurately extracted from the block containing that image. The textual information is used for image representation. By extracting the page-to-block, block-to-image, and block-to-page relationships through link structure and page layout analysis, we construct an image graph. Our method is less sensitive to noisy links than previous methods such as PicASHOW, and hence the image graph better reflects the semantic relationships between images. With the graph models, we use techniques from spectral graph theory and Markov chain theory for image ranking, clustering, and embedding. Some experimental results are given in the paper.
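The abstract names three relationships (page-to-block, block-to-image, block-to-page) and a Markov chain approach to ranking, but it does not say how they are combined. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes the relationships are given as small weight matrices, composes them into an image graph by matrix products, and ranks images with a damped random walk in the style of PageRank. The function names, the toy data, and the damping value of 0.85 are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def image_graph(page_to_block, block_to_page, block_to_image):
    """Compose the three relationships into an image-to-image weight matrix.

    Assumed composition (not confirmed by the abstract):
      block graph = block_to_page x page_to_block
      image graph = block_to_image^T x block graph x block_to_image
    """
    block_graph = matmul(block_to_page, page_to_block)
    return matmul(matmul(transpose(block_to_image), block_graph), block_to_image)


def rank_images(weights, damping=0.85, iterations=100):
    """Score nodes with a damped random walk over the weight matrix."""
    n = len(weights)
    transition = []
    for row in weights:
        total = sum(row)
        # Rows without outgoing weight jump uniformly to every node.
        transition.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


# Toy data (hypothetical): 2 pages, 3 blocks, 3 images.
page_to_block = [[1, 1, 0], [0, 0, 1]]            # page 0 holds blocks 0 and 1; page 1 holds block 2
block_to_page = [[0, 1], [0, 1], [1, 0]]          # blocks 0 and 1 link to page 1; block 2 links to page 0
block_to_image = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # each block contains exactly one image

graph = image_graph(page_to_block, block_to_page, block_to_image)
scores = rank_images(graph)
```

In this toy graph, images 0 and 1 both point at image 2, so image 2 receives the highest score. Because links are attributed to the block that contains them rather than to the whole page, a link in an unrelated block of a page does not connect that page's other images, which is the kind of noise reduction the abstract attributes to block-level analysis.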
Similar resources
A Clustering-Based Algorithm for Automatic Document Separation
For text, audio, video, and still images, a number of projects have addressed the problem of estimating inter-object similarity and the related problem of finding transition, or ‘segmentation’ points in a stream of objects of the same media type. There has been relatively little work in this area for document images, which are typically text-intensive and contain a mixture of layout, text-based...
A Survey Paper of Structure Mining Technique using Clustering and Ranking Algorithm
A survey of various link analysis and clustering algorithms such as PageRank, Hyperlink-Induced Topic Search, Weighted Page Rank based on Visit of Links, K-Means, and Fuzzy K-Means. Among the ranking algorithms illustrated, Weighted Page Rank is more efficient than Hyperlink-Induced Topic Search, whereas among the clustering algorithms described, Fuzzy Soft Rough K-Means is a mixture of Rough K-Means and fuzzy soft...
Searching the Greek WWW
The amount of information available on the WWW is vast and growing at a staggering rate. As a result of this growth, users were unable to explore the vast resources of the Internet. The answer to that problem was the search engine. Internet search engines first appeared in the mid-90s, but have, in a few years, made themselves part of our everyday lives. Following the global trend Greek...
Effective Browsing of Image Search Results with Visual and Diverse Summarization through Clustering
With unprecedented growth in the production of digital images and the use of multimedia references, the need for image and subject search has increased. Systematic processing of this information is a basic prerequisite for its effective analysis, organization, and management. Likewise, large collections of images have been made available on the Web and many search engines have provided the poss...
Using Computer Vision to Detect Web Browser Display Errors
As the functionality and complexity of the WWW continue to grow, so does the need for WWW quality assurance and testing. Although there have been numerous approaches to automated Web testing, existing techniques analyze primarily textual information, and the final judgment on the correctness of layout is made by human observation. The motivation of this paper is to employ computer vision techniques to ...
Publication date: 2004